Minimal discrete matches for target grain size distributions

Author
Affiliation

Lorne Arnold

University of Washington Tacoma

Abstract

Soil is fundamentally a discrete material that is, nevertheless, commonly modeled as a continuum in part because of the computational expense of large-scale discrete element models (DEMs). Even at lab specimen scales, DEM’s computational cost may be substantial depending on the grain sizes being modeled. Despite these limitations, discrete models have proven useful in furthering our understanding of soil mechanics because they can spontaneously replicate realistic soil behavior as an emergent macro-scale property from a collection of particles following relatively simple interaction rules. This makes DEM an attractive tool for a multi-scale modeling approach where the constitutive behavior of a representative volume element (RVE) is characterized with a discrete model and applied at a larger scale through a continuum model. The ability of the RVE to represent a soil depends strongly on an appropriate grain size distribution match. However, in order to achieve computationally feasible models, even at lab scales, DEM simulations often use larger minimum particle sizes and more uniform distributions than their intended targets. Intuitively, discrete matches of different grain size distributions (GSDs) will require vastly different numbers of particles. But the precise relationship between GSD characteristics and the number of particles needed to match the distribution (and by extension the associated computational cost associated) is not intuitive. In this paper, we present the minimal packing set (MPS) concept. The minimal packing set is the smallest set of discrete particles needed to match a given GSD. We present a method for determining the MPS for any GSD and discuss strategies for finding the smallest MPS within a set of tolerances on the GSD. A mapping of USCS classification and MPS reveals a broad distribution of computational cost over several orders of magnitude for granular soils.

Introduction

Soil is a fundamentally discrete granular material whose complex mechanical behavior emerges from the numerous interactions between its constituent parts. The discrete element method (DEM), introduced by Cundall and Strack (1979), has proven to be a powerful tool in exploring the inter-particle interactions of granular assemblies. With DEM, a vast parametric space exists where grain sizes, shapes, contact models, etc. can be systematically varied and their influences on macro-scale assembly behavior can be quantified.

These models pose several challenges that must be managed in order for their benefits to be realized. Due to the number of elements involved in typical DEM simulations, both the computational resources needed to run them and the ability to characterize and interpret them are non-trivial. Over the years, the number of particles used in DEM simulations has increased, but not proportionally to the reduction in costs of computational resources. O’Sullivan (2014) showed that the average number of particles in DEM simulations increased substantially from 1998 to 2014, from approx. 103 to 105, but fell far short of the 107 predicted by Cundall (2001) for “easy” geomechanics problems by 2011. More recent publications in DEM have used particle numbers on the order of 106 (Sufian et al. (2021), Dong, Yan, and Cui (2022)) and in limited cases on the order of 108 ( Fang et al. (2021), Zhang et al. (2024)).

To some extent, the number of particles used to study geomechanics may simply be limited by the number of particles needed to capture the behavior of interest. For example, studies have shown that for certain combinations of grain size distribution (GSD) and density, there may be significant portions of the soil mass that are not mechanically engaged with the soil matrix. This may lead to thresholds of particle size that can be omitted from a given simulation without significant impact on the behavior being studied. This depends, of course, on what the behavior of interest is. DEM studies focusing on shearing resistance will have different needs than those focused on permeability, for example.

Regardless of the application, DEM models need to have a sufficient resolution of particles to create a representative volume element (RVE). Several nuances to the RVE concept exist, but broadly speaking, the RVE represents the smallest volume of material to exhibit statistically consistent macro-scale behavior as its source material. In physical laboratory experiments and discrete numerical experiments, study samples sufficiently large to behave as an RVE is critical to the broader applicability of the results.

The RVE depends on several factors including the GSD and the behavior of interest. Samples with larger maximum particle sizes will have larger RVEs. Depending on the GSD, loading conditions, and behavior of interest, the minimum particle size expected to participate mechanically in the soil matrix may be substantially smaller than the largest particle size. Intuitively then, with increasing GSD breadth, the number of particles needed in a DEM simulation of granular assemblies will also increase. The ratio of minimum and maximum particle sizes is, therefore, a significant and often reported parameter in describing DEM models. Often, matching the true GSD of interest (e.g., from a physical soil sample) is sacrificed for computational efficiency, for example by upscaling the target GSD (Zeraati-Shamsabadi and Sadrekarimi 2025).

While the insights we gain from DEM simulations with upscaled or truncated GSDs are valuable, there are acknowledged limitations imposed during scaling. Several questions around the limitations of DEM scaling are addressed by individual studies. First among these questions is whether a RVE-scale model has been achieved. But another important question exists around the computational cost savings associated with DEM scaling. Framed another way, an important question in experimental design could be: How computationally expensive would an unscaled DEM simulation of the material of interest be?

A brute force approach to answering this question certainly exists: build an unscaled DEM model and track the time required to run a small simulation with it. The obvious disadvantage of this approach is that finding the answer may be computationally expensive itself. A preferred approach would be an analytical solution that provides the number of particles of various grain sizes needed to reproduce any given GSD.

This paper introduces such an approach through the minimal discrete match (MDM) concept. The minimal packing set is the smallest set of discrete particles needed to match a given grain size distribution by volume. In order to characterize the MDM, descriptions of grain size distributions and granular samples are presented as mathematical sets and the rules describing several relationships between the sets are defined. Using these definitions, a simple algorithm is shown to efficiently identify the MDM. The results show a broad range of MDM magnitudes over several GSDs of interest in geotechnical engineering. The MDM algorithm is implemented in an open-source Python module, available on GitHub.

Discrete definitions

Identifying the mimimal discrete match requires rigorous mathematical definitions for particulate samples, \(S\), and grain size distributions, \(G\). Conceptually, each of these are collections of items (i.e., sets) with specific restrictions.

Sample definition

A sample, \(S\) is an indexed set describing its particles as pairs of size, \(X_S\), and quantity, \(Q_S\). \(X_S\) is an ordered (i.e., each entry is larger than the previous) set of unique values of positive real numbers. \(Q_s\) is an unordered set of positive integers related to \(X_S\) by a shared indexing set for the sample, \(I_S\).

\[S = \{(X, Q) \in I_S\} \tag{1}\] where:

\[X_S = \{x_i : i \in I_S\} \text{ with } x_1 < x_2 < \cdots < x_{n_S} \tag{2}\]

\[ Q = \{q_i : i \in I_S\} \tag{3}\]

\[ X_S \subset \mathbb{R}^+ \text{ and } Q \subset \mathbb{Z}^+ \tag{4}\]

Grain size distribution definition

A grain size distribution, \(G\), has a similar structure. It is and indexed set describing size boundaries, \(X_G\), and masses, \(M_G\). Like \(X_S\), \(X_G\) is an ordered set of unique values, however, it’s domain also includes zero (representing the pan in a typical sieve analysis). Like \(Q_S\), \(M_G\) shares an index (\(I_G\)) with the size set, but it’s domain is not restricted to integers, only to non-negative real numbers. The values in \(M_G\) represent the masses retained on a sieve with opening size \(X_G\).

\[G = \{(X_G, M_G) \in I_G\} \tag{5}\] where:

\[ X_G = \{x_j : j \in I_G\} \text{ where } x_1 < x_2 < \cdots < x_{n_G} \tag{6}\]

\[ M = \{m_j : j \in I_G\} \tag{7}\]

\[ X_G \subset \mathbb{R}^{non-neg} \text{ ; } V \subset \mathbb{R}^{non-neg} \tag{8}\]

Comparison definition

The relationship between \(S\) and \(G\) cad be defined in terms of how \(G\) describes \(S\). In the context of DEM modeling, this may seem backward because typically in DEM modeling, a target GSD is defined first and a sample generated to match it. On the other hand, a discrete sample (either physical or numerical) exists on its own, whereas a GSD only has meaning when interpreted as a description of a sample. Therefore the relationship between \(S\) and \(G\) will be defined in terms of \(G\)’s description of \(S\). In a later section, the topic of finding an instance of \(S\) that matches a target \(G\) will be addressed.

The relationship between \(S\) and \(G\) is formalized in three conditions described as follows:

Condition 1: \(G\) is complete if and only if the final entry in \(M_G\) is zero: \[G \text{ is complete } \Leftrightarrow m_{n_G} = 0 \tag{9}\]

This condition ensures that \(x_{nG}\) provides an upper bound to the sizes described by \(G\). This is analogous to the limitation of scope in ASTM D6913 to grain sizes passing a 75-mm sieve (D18 Committee, n.d.).

Condition 2: \(G\) describes \(S\) (denoted \(G \longrightarrow S\)) if and only if \(\text{Cond1}\) is met and the its smallest size is smaller than any size in \(S\) and its largest size is larger than any size in \(S\):

\[ G \longrightarrow S \Leftrightarrow \text{ Cond1} \land \min(X_G) < \min(X_S) \land \max(X_G) \ge \max(X_S) \tag{10}\]

Condition 3: \(G\) describes \(S\) articulately if \(\text{Cond2}\) is met and for all consecutive pairs of sizes in \(X_S\) there exists at least one size in \(X_G\) between them.

\[ G \stackrel{\text{articulately}}{\longrightarrow} S \Leftrightarrow \text{ Cond2} \land \forall (x_i, x_{i+1}) \in X_S \space \exists \space (x_i < x_j \in X_G \le x_{i+1}) \tag{11}\]

Condition 4: \(G\) describes \(S\) accurately if \(\text{Cond2}\) is met and the combined masses of all the particles with sizes between every pair of sizes in \(X_G\) is equal to the retained mass on the lower of the size pairs in \(G\):

\[ G \stackrel{\text{accurately}}{\longrightarrow} S \Leftrightarrow \text{ Cond2} \land \sum_{\substack{i \in I_S \\ x_j < x_i \le x_{j+1}}} q_i \cdot f(x_i) = m_j \tag{12}\]

where \(f(x)\) is a scaling function that converts size to mass. As indicated in Equation 12, a size in \(X_S\) is considered between two sizes in \(X_G\) if it is larger than the \(x_j\) and smaller than or equal to \(x_{j+1}\). This is physically analogous to assuming a particle will pass through an opening equal to its own size.

Note that \(\text{Cond3}\) is not required for \(\text{Cond4}\) to be satisfied. Equation 8

Physical interpretation

In physical terms, some assumptions are required in order for \(S\) and \(G\) to be interpreted as a physical sample and sieve analysis (or a numerical version of the same). First, the sizes in \(X_G\) are assumed to represent smallest enclosing diameter of whatever shapes the particles in \(S\) are. Second, the particles in \(S\) are assumed to be of the same shape such that a single scaling function, \(f(x)\), can represent the size-mass relationship for all of \(S\). For spherical particles with of constant density (\(\rho\)), this is a straightforward definition:

\[f(x) = \frac{\rho\pi}{6}x^3\]

Note, however, that the use of this particular scaling function is not required in general. It is only necessary that \(f(x)\) be some mapping of \(x \mapsto m\). It would be possible to combine sets of differently shaped particles with respective mapping functions, but this paper will limit itself to assuming uniform, spherical particles.

Grain size distributions considered

This paper will analyze several grain size distributions in the range of interest in geotechnical engineering (although the methods are applicable over a wider range). Specifically, randomized GSDs based on normal distributions across the sieve set specified in ASTM D6913 (14 opening sizes from 0.075-mm to 75-mm). Table 1 summarizes the GSDs used in this study using several traditional characteristics including coefficient of curvature, \(C_c\), and coefficient of uniformity, \(C_u\). The grain size index, \(I_{GS}\), as defined by Erguler (2016) and a new parameter, the curvature index, \(I_C\), are also presented. The curvature index is conceptually similar to the grain size index in that it is a ratio of areas related to the cumulative distribution function (in percent passing, log size space) of the GSD. \(I_C\) normalizes the area under the GSD curve to the trapezoidal area bounded by the percent passing the smallest size above the pan (\(x_{min+1}\)) and the largest size (\(x_{max}\)). Figure 1 shows a small subset of the GSDs considered and illustrates the definition of \(I_C\) on an example GSD.

Some traditional grain size distribution characteristics

Table 1: Grain size distribution parameters in this study
Parameter Value(s) in study
\(x_{min+1}\) 0.075 to 25 mm
\(x_{max}\) 9.5 to 75 mm
# of sieves 5 to 14
# of \(G\) instances 2750
\(C_c\) 0.2 to 25.5
\(C_u\) 1.2 to 214.5
\(I_{GS}\) 0.03 to 0.60
\(I_C\) 0.29 to 1.57
Figure 1: Grain size distributions evaluated. The area ratio defining the curvature index is shown. The example curve has a curvature index of 1.09.

Solving for the minimal discrete match

For any given \(S\), finding \(G\) that accurately describes \(S\) is no more complicated that following the grain size analysis procedure described in ASTM D6913 (D18 Committee, n.d.). The reverse, finding a solution for \(S\) that is accurately described by some target \(G\), is non-trivial. One of the reasons is that unlike a grain size analysis, the sizes of the solution (\(X_S\)) are bounded, but not explicitly defined. Additionally, the goal for this procedure is not only to find any \(S\) that is accurately described by \(G\), but the minimal packing set \(S_{min}\) that is accurately described by \(G\).

As a starting point, assuming that \(G\) is an articulate description of \(S\) and selecting any valid values for \(X_S\), an indexed set of mass ratios, \(\Phi\), and volume ratios, \(Z\) can be defined:

\[\Phi = \left\{\frac{m_j}{m_{n_G-1}} : j \in I_G\right\} \text{; with elements } \phi_j\]

\[Z= \left\{\frac{f(x_i)}{f(x_{n_S})} : i \in I_S\right\} \text{; with elements } \zeta_i\]

The mass ratio \(\Phi\) describes the masses in \(M_G\) relative the the mass of the largest size retaining mass. Consequently, \(\phi_{nG-1}\) will always equal 1. The volume ratio \(Z\) describes the relative volume of each particle size to that of the largest particle size in \(S\). Consequently, \(\zeta_{nS}\) will always equal 1. Note that \(Z\) is referred to as a volume ratio to avoid convusion with \(\Phi\) despite the fact that the mapping function \(f(x)\) converts size to mass. Interpreting \(Z\) as a volume ratio is valid under a constant density assumption.

Since \(\Phi\) describes relative masses of each size and \(Z\) describes the relative volume (and mass) per particle of each size, a quantity ratio, \(K\), can be defined by the ratio of \(\Phi\) to \(Z\).

\[K= \left\{\frac{\phi_i}{\zeta_{n_S}} : i \in I_S\right\} \text{; with elements } \kappa_i\]

The quantity ratio \(K\) is the relative number of particles of each size for any \(S\) accurately described by \(G\). It is directly proportional to the key target parameter, \(Q\). Unfortunately, with the exception of \(\kappa_{nS}\), the entries in \(K\) are not guaranteed to be integers, which is a requirement for \(Q\). Rounding each element in \(K\) to the nearest integer provides an approximate solution for the minimal discrete match.

\[ S_{approx} = \{(X, \text{int}(K)) \in I_S \approxeq S_{min}\} \]

Checking \(\text{Cond4}\) against \(S_{approx}\) and \(G\) shows how much error there is in the approximate solution. Depending on the application and acceptable error tolerance, \(S_{approx}\) may be a suitable substitute for \(S_{min}\). If a closer fit is needed, iteratively multiplying \(K\) by integers greater than 1 can be used to reduce the error to tolerable levels.

But since the sizes in \(X_S\) are only bounded by \(X_G\) and not explicitly defined, the opportunity exists to find a better selection of entries in \(X_S\) that will minimize \(Q_S\). This approach is equivalent to dropping the articulate description (\(\text{Cond3}\)) requirement from the initial attempt.

The best allowable sizes for \(X_S\) can be found using the following steps:

  1. Find the quantity ratios associated with the minimum (\(-\)) and maximum (\(+\)) allowable sizes in all but the largest size in \(X_S\) (i.e. \(K_-\) and \(K_+\)).
  2. Check whether an integer is contained between each entry in \(K_-\) and \(K_+\).
  3. If not, iteratively increase \(K_-\) and \(K_+\) until the span between each entry contains an integer.
  4. When an integer quantity exists between each entry in \(K_-\) and \(K_+\), these integers can be used to populate the spanned integer quantity ratio, \(K_{SI}\).
  5. Invert the mass scaling function \(f(x)\) on the ratio \(\Phi\) / \(K_{SI}\) to identify the compatible sizes for the spanned integer quantities: in \(X_S\).

\[ X_{SI} = f^{-1} \left( \frac{\Phi}{K_{SI}} \right) \tag{13}\]

Because \(K_{SI}\) was generated using the allowable sizes of \(X\) and the mass ratios of the target \(G\), the resulting \(X_{SI}\) satisfies \(\text{Cond4}\) with \(K_{SI}\) as the smallest possible quantity of particles, or the minimal discrete match:

\[ S_{min} = \{X_{SI}, K_{SI}\} \]

The effectiveness of the spanned integer algorithm is shown in the rapid convergence to numerical zero error compared to a fixed-size approximation in Figure 2. The figure shows both approaches attempting to find a suitable sample for each of the GSDs shown in Figure 1 with an allowable error tolerance of 1 percent. In nearly all cases, the spanned integer approach requires no iteration. Where iteration is needed, it converges to numerical zero within one or two iterations. The fixed size approach converges with between 1 and 15 iterations and retains measurable error.

Figure 2: Packing Algorithm Convergence

Note that this solution for the minimal packing set is dependent on the selected value for \(x_{n,S}\). Because the maximum particle size, \(x_{n,S}\) controls the packing set, and a smaller value for \(x_{n,S}\) will tend to minimize the set, if the full range of possible \(x_{n,S}\) is considered, \(x_{n,S}\) and \(x_{n-1,S}\) will tend to converge. The result of that convergence would be equivalent to the solution for a target \(G\) with \(x_{n-1,S} \approxeq x_{n,S} = x_{n-1,G}\) and the \(x_{n-1,G}\) sieve removed.

Characteristics of the MDM

The MDM increases with increasing mass ratio. The most critical mass ratio is smallest size to largest size.

The same is true of the volume ratio, which spans a larger range than volume ratio. This is an expected result. It was already mentioned that dmin/dmax is a commonly reported parameter for this reason.

?@fig-phi_N illustrates the relationship between MR and MDM size as well as vol ratio with color.

While not unexpected, the dependence on mass and volume ratios can be quantified through the MDM concept.

Slightly less intuitive is the relationship between curvature index and MDM size, which is also positive.

?@fig-curvature_N shows the relationship between I_C and MDM size. C_c, C_u, and I_GS were found to not correlate directly with MDM size. However, I_GS is helpful for visualizing the breadth of correlation between MDM size and uscs classifications. ?@fig-uscs_N shows the range of MDMs associated with the granular uscs classifications in this study.

/var/folders/q0/kxmqm5c95n7cxc6mmqklw1k40000gq/T/ipykernel_28978/1223607840.py:41: RuntimeWarning: invalid value encountered in log10
  "shape_factor": np.log10(
Figure 3: Change in total number of MPS Particles with mass ratio.
Figure 4: Change in total number of MPS Particles with curvature index.
Figure 5: USCS classifications of the MPS data.

Implications for DEM modeling

The algorithms described above can 1. Find a first-order approximation of the minimal packing. 2. Quantify the error in approximate minimal packings. 3. Find a rigorous solution minimal packing.

Conclusions

The rigorous minimal packing may be significantly larger than the first-order approximation or some other packing set within some acceptable tolerance. Whether a rigorous or approximate solution is needed depends very much on the application. But these concepts can be used to efficiently quantify the computational cost of rigor in the \(S:G\) match.

Insert a plot showing a realistic GSD and the sieve-specific error trend with increasing N. I could draw a typical GSD with a dual axis with N on the right. There would be several size-N lines with a colorscale indicating their error? or I could plot several GSDs and color by N and use the other axis for the error.

They can also be used to evaluate the computational cost of approximate and rigorous minimal packings for different USCS classifications.

Insert a plot showing error v. N for granular USCS soils.

References

Cundall, P. A. 2001. “A Discontinuous Future for Numerical Modelling in Geomechanics?” Proceedings of the Institution of Civil Engineers - Geotechnical Engineering 149 (1): 41–47. https://doi.org/10.1680/geng.2001.149.1.41.
Cundall, P. A., and O. D. L. Strack. 1979. “A Discrete Numerical Model for Granular Assemblies.” Géotechnique 29 (1): 47–65. https://doi.org/10.1680/geot.1979.29.1.47.
D18 Committee. n.d. “Test Methods for Particle-Size Distribution (Gradation) of Soils Using Sieve Analysis.” https://doi.org/10.1520/d6913_d6913m-17.
Dong, Youkou, Dingtao Yan, and Lan Cui. 2022. “An Efficient Parallel Framework for the Discrete Element Method Using GPU.” Applied Sciences 12 (6): 3107. https://doi.org/10.3390/app12063107.
Erguler, Zeynal Abiddin. 2016. “A Quantitative Method of Describing Grain Size Distribution of Soils and Some Examples for Its Applications.” Bulletin of Engineering Geology and the Environment 75 (2): 807–19. https://doi.org/10.1007/s10064-015-0790-1.
Fang, Luning, Ruochun Zhang, Colin Vanden Heuvel, Radu Serban, and Dan Negrut. 2021. “Chrono::GPU: An Open-Source Simulation Package for Granular Dynamics Using the Discrete Element Method.” Processes 9 (10): 1813. https://doi.org/10.3390/pr9101813.
O’Sullivan, C. 2014. “Advancing Geomechanics Using DEM.” In, edited by Kenichi Soga, Krishna Kumar, Giovanna Biscontin, and Matthew Kuo, 21–32. CRC Press. http://www.crcnetbase.com/doi/abs/10.1201/b17395-4.
Sufian, Adnan, Marion Artigaut, Thomas Shire, and Catherine O’Sullivan. 2021. “Influence of Fabric on Stress Distribution in Gap-Graded Soil.” Journal of Geotechnical and Geoenvironmental Engineering 147 (5): 04021016. https://doi.org/10.1061/(ASCE)GT.1943-5606.0002487.
Zeraati-Shamsabadi, Mohammad, and Abouzar Sadrekarimi. 2025. “A DEM Study on the Effects of Specimen and Particle Sizes on Direct Simple Shear Tests.” Granular Matter 27 (2). https://doi.org/10.1007/s10035-025-01513-y.
Zhang, Ruochun, Bonaventura Tagliafierro, Colin Vanden Heuvel, Shlok Sabarwal, Luning Bakke, Yulong Yue, Xin Wei, Radu Serban, and Dan Negruţ. 2024. “Chrono DEM-Engine: A Discrete Element Method Dual-GPU Simulator with Customizable Contact Forces and Element Shape.” Computer Physics Communications 300 (July): 109196. https://doi.org/10.1016/j.cpc.2024.109196.